Using MMIL for the High Level Semantic Annotation of the French MEDIA Dialogue Corpus

Authors

Lina Maria Rojas-Barahona

Thierry Bazillon

Matthieu Quignard

Fabrice Lefèvre

Abstract

The MultiModal Interface Language formalism (MMIL) has been selected as the High Level Semantic (HLS) formalism for annotating the French MEDIA dialogue corpus. This corpus is composed of human-machine dialogues in the domain of hotel reservation and tourist information. Utterances in the dialogues were previously annotated with a flat concept-value semantics for studying and evaluating spoken language understanding modules in dialogue systems. We are now interested in investigating the use of more complex representations to improve the understanding capability. The MMIL intermediate language is a high-level semantic formalism that carries relevant linguistic information, from syntax up to discourse. This representation should increase the expressivity of the current annotation, though at the cost of a more complex annotation process. In this paper we present our first attempt at defining the annotation guidelines for the HLS annotation of the MEDIA corpus, and the effect of these guidelines on the annotation process itself, as revealed by annotators’ disagreements arising from the different levels of hierarchy and the granularity of the features defined in MMIL.

Similar resources

Extending MMIL Semantic Representation: Experiments in Dialogue Systems and Semantic Annotation of Corpora

The MultiModal Interface Language formalism (MMIL) is a modality-independent high-level semantic representation language. It has been used in different projects, related to different domains, and with distinct tasks and interaction modes. MMIL is a metamodel that enables the definition of generic and domain-specific descriptors for dialogue management, offering flexibility and high reusability. T...

Semantic Frame Annotation on the French MEDIA corpus

This paper introduces a knowledge representation formalism used for annotation of the French MEDIA dialogue corpus in terms of high level semantic structures. The semantic annotation, worked out according to the Berkeley FrameNet paradigm, is incremental and partially automated. We describe an automatic interpretation process for composing semantic structures from basic semantic constituents us...

An Incremental Architecture for the Semantic Annotation of Dialogue Corpora with High-Level Structures. A case of study for the MEDIA corpus

The semantic annotation of dialogue corpora makes it possible to build efficient language understanding applications that support enjoyable and effective human-machine interactions. Nevertheless, the annotation process can be costly, time-consuming and complicated, particularly as the semantic formalism becomes more expressive. In this work, we propose a bootstrapping architecture for the semantic annota...

Portability of Semantic Annotations for Fast Development of Dialogue Corpora

The generalization of spoken dialogue systems increases the need for fast development of spoken language understanding modules for the semantic tagging of speakers’ turns. Statistical methods perform well on this task but require large corpora for training. Collecting such corpora is expensive in time and human expertise. In this paper we propose a semi-automatic annotation process for fast pr...

Leveraging study of robustness and portability of spoken language understanding systems across languages and domains: the PORTMEDIA corpora

The PORTMEDIA project is intended to develop new corpora for the evaluation of spoken language understanding systems. The newly collected data are in the field of human-machine dialogue systems for tourist information in French in line with the MEDIA corpus. Transcriptions and semantic annotations, obtained by low-cost procedures, are provided to allow a thorough evaluation of the systems’ capa...

Publication date: 2011
